Internet Info 1997 December

Internet Message Format | 1997-04-01 | 17KB

Received: (from daemon@localhost) by services.bunyip.com (8.8.5/8.8.5)
    id CAA12892 for urn-ietf-out; Mon, 31 Mar 1997 02:36:37 -0500 (EST)
Received: from mocha.bunyip.com (mocha.Bunyip.Com [192.197.208.1])
    by services.bunyip.com (8.8.5/8.8.5) with SMTP id CAA12887
    for <urn-ietf@services.bunyip.com>; Mon, 31 Mar 1997 02:36:32 -0500 (EST)
Received: from sdgmail.ncsa.uiuc.edu by mocha.bunyip.com with SMTP
    (5.65a/IDA-1.4.2b/CC-Guru-2b) id AA28575
    (mail destined for urn-ietf@services.bunyip.com);
    Mon, 31 Mar 97 02:36:28 -0500
Received: from void.ncsa.uiuc.edu (void [141.142.103.20])
    by ncsa.uiuc.edu (8.8.5/8.8.5) with ESMTP id BAA10602;
    Mon, 31 Mar 1997 01:36:29 -0600 (CST)
From: Daniel LaLiberte <liberte@ncsa.uiuc.edu>
Received: (from liberte@localhost) by void.ncsa.uiuc.edu (8.8.2/8.8.2)
    id BAA02687; Mon, 31 Mar 1997 01:36:24 -0600 (CST)
Date: Mon, 31 Mar 1997 01:36:24 -0600 (CST)
Message-Id: <199703310736.BAA02687@void.ncsa.uiuc.edu>
To: "Ron Daniel, Jr." <rdaniel@acl.lanl.gov>
Cc: Dan Connolly <connolly@w3.org>, urn-ietf@bunyip.com
Subject: [URN] Relative URNs considered harmful
In-Reply-To: <3.0.32.19970329143723.0096ca00@acl.lanl.gov>
References: <3.0.32.19970329143723.0096ca00@acl.lanl.gov>
Sender: owner-urn-ietf@Bunyip.Com
Precedence: bulk
Reply-To: Daniel LaLiberte <liberte@ncsa.uiuc.edu>
Errors-To: owner-urn-ietf@Bunyip.Com

Ron Daniel, Jr. writes:
> During the earlier set of discussions my view on relative
> URNs changed from "unnecessary but probably harmless" to
> "unnecessary and probably harmful". Let me explain why.

My position has been that support for relative URNs is necessary, but
not for the reasons you argue against. First, I'll argue against your
position and then put forward my own.

> Unnecessary:
> This is not the key point of controversy, but let me dispose of
> it quickly.
>
> Relative URLs came about for two reasons. The first was concision,
> a reasonable consideration since we were all creating HTML by hand
> using our editors.
> (Many of us still do, so this point has some
> weight). The second reason was to make it easier to move connected
> sets of resources from one site to another. If the links were relative,
> there was less patching to do.
>
> Since URNs are defined to be location independent, it is not the
> documents that need to be edited if we want to move a bunch of
> resources from one location to another. Therefore, the most important
> reason for relative URLs does not apply to URNs.

Given the continued existence of relative URLs, I would agree that
relative URNs are not as necessary regarding the niche filled by
relative URLs. However, you seem to be assuming that relative URLs
will be phased out to be replaced by URNs once they are available.
I don't think so for a couple reasons.

First, what you called concision will still be significant, depending
on the lengths of URNs. There is also the matter of other kinds of
convenience. If one must go through some naming authority to first
assign names for all the parts of a structured document, this would be
more inconvenient than many people will tolerate. If relative URLs
are easier to create and use than URNs, then people will continue to
use them.

Second, relative URLs will not lose their location-independence just
because URNs exist. Once a document is found via some identifier,
whether a location or a name, the context of that document provides
all the info needed to find the other documents referenced via
relative URLs. See below for more on how that is done.

But that is an argument for why relative URLs will continue to exist.
Part of the argument for why relative URNs will be useful is similar.
Certainly a relative URN can be just as concise as a relative URL.
And given a context for interpretation of a relative URN, there is no
problem in their use as another form of "location independent"
identifiers.

But there is another reason why relative URNs will be useful which has
to do with scalability.
See below for that argument.

> Probably Harmful:
> This will be more contentious, but here goes. This argument depends
> on an observation:
>
> Observation - a resource may have more than one URN.

It is good to remind people that a resource may have more than one
URN. Some people have the mistaken belief that URNs will solve the
problem of knowing, by simple inspection, whether two different
identifiers are for the same resource.

> We have used the weather map as one example of a resource having, at
> least temporarily, two URNs.

I like that example.

> [...]
> All of these identifiers are reasonable candidates for URN namespaces.
> If we were to follow Dan's suggestion of using "/" for hierarchy,

For clarification, both Dan Connolly and myself have suggested this.

> (and our already-agreed upon use of "urn:") we would have

BTW, "urn:" *was* optional up until some unknown time. Last mention I
heard of this optionality was a couple months ago when Karen Sollins
was arguing for why it causes ambiguity in the resolution process. My
response to that got too close to the URN-URL debate and I was told to
stop. So when was the official decision made to require "urn:"?

>   urn:isbn:0/679/45446/2
>   urn:upc:9/780679/454465
>   urn:lc:96/34802
>
> The problem with relative URNs is that there is no consistent hierarchy
> across all identification schemes.

There doesn't need to be one hierarchy for all schemes. In fact, it
is usual for there to be separate hierarchies for each scheme, if
there is a hierarchy at all, and only rarely will there be any
overlap, unless there is some transition in progress.

Clearly relative URNs cannot be relative to multiple incompatible URN
contexts at the same time. But this is not a problem for relative
URNs because the *one* context for their interpretation should be
known to the author or provider of the document, and this context
should be made known to the client.
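
As a rough illustration of the "one context" point, here is a minimal
sketch in Python. It assumes the "/"-separated forms quoted above, which
are only the hierarchies guessed at in this thread and not real ISBN,
UPC, or LC conventions. The names resolve_relative and declared_base are
hypothetical, and replacing the last path component is a deliberate
simplification of the relative URL rules.

```python
def resolve_relative(base_urn: str, relative: str) -> str:
    """Replace the last '/'-separated component of base_urn with relative.

    Simplified stand-in for relative-URL resolution; assumes the base
    uses the hypothetical '/' hierarchy discussed in this thread.
    """
    prefix, sep, _last = base_urn.rpartition("/")
    if not sep:
        raise ValueError(f"base has no visible hierarchy: {base_urn!r}")
    return f"{prefix}/{relative}"


# The three names quoted above for the same book.
names = [
    "urn:isbn:0/679/45446/2",
    "urn:upc:9/780679/454465",
    "urn:lc:96/34802",
]

# Assumption: the author declares exactly one of them as the base,
# whichever URN the client happened to use to reach the document.
declared_base = names[2]

print(resolve_relative(declared_base, "4"))
```

Whichever of the three names the client started from, the relative
identifier is interpreted only against the declared base, so the result
does not depend on the access path.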

> Assuming Le Carre's work referred
> to another using the relative identifier "4", what does that mean?

It would mean the one thing that is appropriate, and nothing more.
More on that below.

> Is it
>   urn:isbn:0/679/45446/4   // An illegal ISBN since we have only
>                            // munged the check character.

So the ISBN hierarchy we guessed at doesn't work because of embedded
check chars. One of two things can be done: give up on ISBN as a
hierarchy, or further map the name space to remove the check character
to make it a proper hierarchy. Or perhaps the check character can
always be appended as part of the last component, not the last
component itself, e.g. 45446,4 or whatever is appropriate - I don't
grok the ISBN notation.

But perhaps ISBN doesn't help as a hierarchy anyway because higher
levels (prefixes of the path) never represent reasonable, useful
collections. If there is hardly ever any reason to refer to other
members of a collection with a relative URI, then there is no point in
making the name space hierarchical. More on semantic organization
below.

>   urn:upc:9/780679/4   // This might work, except that once again
>                        // the check character has probably been munged.
>                        // I haven't read the UPC rules lately, I think
>                        // the initial "9" is the check character.

Then the name space would have to be mapped to something that worked
hierarchically. But, again, only if it is useful.

>   urn:lc:96/4          // this might work

Seems a bit short for a unique ID.

> Other possibilities are to take the URL that was used to fetch the resource
> (assuming there was one) and use the relative identifier "4" in conjunction
> with it.

If there was a URL involved in actually fetching the resource, and if
the relative URI rules are followed and it is determined that that URL
is what should be used for the base, then so be it.

> The difficulty of correctly dealing with check characters is only one
> of the problems with relative URNs.
> The big point is that there is no
> uniform hierarchy across all namespaces, and without one it is unsafe
> to do relative URI processing. We have to know the original identifier the
> new one is relative to.

We have to know the intended identifier or context that any relative
identifier is relative to. Yes. This is not a problem. This is by
design.

Even without relative URNs, the same potential problem arises for
relative URLs when the same resource may be accessed by different
URLs, each with incompatible hierarchies. For example, a symbolic
link to a document may create a second access path to the same
document, and relative URLs within the document may be correct
relative to one path but not the other. (I can be more explicit in
this example if you want.) The solution is that in such a situation,
the document should have a BASE specification of some kind to say
which path is correct.

> The last time we took up this topic, Dan LaLiberte presented the very
> nice set of rules that are used in relative URL processing to answer
> this question. Let's go through those and see if they apply to URNs.

And I clarified the rules regarding chains of indirections.

> From: Daniel LaLiberte
> Date: Fri, 31 Jan 1997 16:34:21 -0600 (CST)
>
> >... finding the base URI ...
> >Use the first one that succeeds:
> >
> >1. Use the explicit base URI from the document content, if any.
> >2. Use the explicit base URI from the encapsulating entity, if any.
> >   (e.g. http response message, another document, etc)
> >3. Use the URI used to retrieve the entity, if any.
> >4. Otherwise the base URI is undefined.
> >
> >Step 3 should be clarified: If there is no explicit base URI
> >found by step 1 and 2, we should use the *last* URI used
> >to retrieve the entity, not the first or some intermediate. This
> >applies both for a chain of URL redirections or for a URN that is
> >resolved into a URL.
> >Roy Fielding pointed this out to me when I
> >thought it should be the first URI used, or perhaps the last
> >permanent redirect.
>
> Step one seems dodgy for our example, we have the equivalent of three
> BASE tags. (Don't tell me there should be only one, the book was printed
> with three.)

I can only tell you there must be only one. We do not have the
equivalent of three BASE tags. If you still think so, then you
haven't understood the rules.

If there are multiple URNs that a document is accessed by, and the
document contains relative URNs, then the author or creator of the
document must have one of those name spaces in mind when using
relative URNs. Which one it is should be specified in the BASE tag.
This doesn't mean the other URNs will not work. They will work, and
when a relative URN is encountered, the appropriate context will be
known and used instead of the URN you accessed the document by.
Problem solved.

> Steps 2 and 3 suffer from the same problem. Over long time scales, there is
> no telling what sort of URI will be used to fetch the thing. Assuming
> that it will have the same hierarchy as the original seems dangerous.

Maybe it is now clear where you misunderstood. If steps 2 or 3 are
used, then the same answer as above applies. If relative URNs are
used by an author knowing that they are intended to be persistent,
then the author is essentially promising that the hierarchy implied by
those relative URNs will persist. Relative URNs correspond to full
URNs and a promise of persistence is a promise either way.

Note that this concern about a possibly changing hierarchy should be
divorced from issues of filesystem organization, as I argued in
earlier messages. I'll dig that out if you want it repeated.

--------------

Now for the main reason that I believe relative URNs should be
supported. First, the reason is not so much for support of relative
URNs themselves.
As much as they are useful, they are not essential since full URNs
and relative URLs can be used too. The real reason is that because
relative URNs (of the hierarchical kind) require a publicly visible
hierarchy, that same hierarchy can be used by clients to support more
scalable resolution. I'll explain.

A principle I stated earlier that you agreed to is that with an ever
increasing number of users, more of the work of resolving identifiers
must be done by clients or servers near the clients.

If, in attempting to resolve a name, we have no clues about where to
start other than at the top of the URN space, because the naming
authority is not previously known, then we must go to the top to find
out about the naming authority. Once we know where that naming
authority is and have info on its associated resolvers, we can resolve
other URNs that have the same naming authority starting with the same
resolvers.

But if there is no subdivision of this naming authority's name space
that is visible to clients, then all future resolutions of names in
the name space must go through the very same resolvers. Those
resolvers will become increasingly busy in proportion to the number of
users and URNs in that space. Caches of documents will only help for
those individual documents, but every document in the same space is
independent of every other document since there are no collections or
subspaces within the space to start the resolution from.

With a hierarchically structured name space, resolution of an
identifier can proceed by first looking up information in local caches
about resolvers or RDSs for the most specific known subspace
corresponding to some prefix of the identifier. Only then do we need
to ask remote servers to resolve the remainder, or look up resolvers,
or whatever needs to be done.

Maybe you didn't intend to argue against hierarchical name spaces but
only against relative URNs. Well, once you have hierarchical name
spaces, then relative URNs come for free.
The only overhead is the same as for relative URLs - being clear
about the context.

There may actually be another way to get a similar kind of scalable,
mostly local resolution without a visible hierarchy (and thus without
support for relative URNs). Instead of extracting structure from the
identifier, the client must do a sequence of transformations of the
identifier into intermediate states of some kind until ultimately the
resource is found. If the intermediate states are also identifiers,
then this is equivalent to a sequence of indirections. These
transformations can be done locally based on locally cached code that
directs the transformation (such as regular expressions or Java
applets).

But to be scalable, it must be the case that we can reuse previously
retrieved transformation code, otherwise we will be always asking
remote servers for the appropriate code, and nothing will have been
gained. This seems possible, if the name space is effectively, though
not visibly, organized, but there is another overriding concern.

In order for any caching to be useful in the first place, there must
be some locality of reference to take advantage of. Locality of
reference means that the next reference to something new is similar to
previous references. "Similar to" means semantically related or
somehow related in a way that is meaningful to the human user. You
don't typically read publication #97-1489 just because you have read
#97-1488 unless there is some relationship between the two. If your
next reference is completely independent from previous references,
there is likely to be no information in local caches to help you.

So if there is to be some semantic organization of the name space to
promote caching, then hierarchy is one common way things are
organized. The hierarchy may not be visible in identifiers, and may
only be in an invisible hierarchy of transformations, but I don't
understand the advantage of always and only remaining invisible.
If you are thinking along these lines, please explain.

A matrix space is another way to organize things, but I haven't
thought enough about it.

So yes, I hope we can come to consensus on the issue of relative URNs.
We should agree that they must be allowed, along with allowing
hierarchical name spaces. Until there is a strong enough argument for
how non-hierarchical name spaces can support scalable resolution, I
would hesitate to disallow hierarchical name spaces.

--
Daniel LaLiberte (liberte@ncsa.uiuc.edu)
National Center for Supercomputing Applications
http://union.ncsa.uiuc.edu/~liberte/